{
 "cells": [
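  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Music Service User Churn Prediction: Building the Index Tables\n",
    "\n",
    "### Dataset description\n",
    "\n",
    "The project provides KKBOX user-song repeat-play records, along with user and song metadata. The training data consists of users whose service expired in February 2017, and the target label indicates whether the user renewed in March 2017. The test set consists of users whose service expires during March 2017; the task is to predict the probability that each user renews or churns within one month after expiration, i.e. in April 2017.\n",
    "\n",
    "The files and their fields are described below:\n",
    "\n",
    "1. train.csv: training data, 7,377,418 records\n",
    "\n",
    "    msno: user id, an encrypted string\n",
    "\n",
    "    song_id: song id\n",
    "\n",
    "    source_system_tab: the type/tab where the event was triggered; it indicates the app's function category\n",
    "\n",
    "    source_screen_name: name of the layout the user sees\n",
    "\n",
    "    source_type: the type of entry point from which the user played the music in the app\n",
    "\n",
    "    target: the label. 1 means the user continues to subscribe within one month after first listening to the music; 0 means the user does not.\n",
    "\n",
    "2. test.csv: test data, 2,556,790 records\n",
    "\n",
    "    id: row id (used for the submission)\n",
    "\n",
    "    msno: user id\n",
    "\n",
    "    song_id: song id\n",
    "\n",
    "    source_system_tab: the type/tab where the event was triggered; it indicates the app's function category\n",
    "\n",
    "    source_screen_name: name of the layout the user sees\n",
    "\n",
    "    source_type: the type of entry point from which the user played the music in the app\n",
    "\n",
    "3. sampleSubmission.csv: sample submission file\n",
    "\n",
    "    A submission contains two fields, the test sample id and the probability that its label is 1, in the following format:\n",
    "\n",
    "    id,target\n",
    "    \n",
    "    2,0.3\n",
    "    \n",
    "    5,0.1\n",
    "    \n",
    "    6,1\n",
    "    \n",
    "    etc.\n",
    "\n",
    "4. songs.csv: song metadata, Unicode-encoded\n",
    "\n",
    "    song_id: song id\n",
    "\n",
    "    song_length: song length in ms\n",
    "\n",
    "    genre_ids: genre categories; a song may have several, separated by `|`\n",
    "\n",
    "    artist_name: artist\n",
    "\n",
    "    composer: composer\n",
    "\n",
    "    lyricist: lyricist\n",
    "\n",
    "    language: language\n",
    "\n",
    "5. members.csv: user metadata\n",
    "\n",
    "    msno: user id\n",
    "\n",
    "    city: city\n",
    "\n",
    "    bd: age. Note: this column contains outliers\n",
    "\n",
    "    gender: gender\n",
    "\n",
    "    registered_via: registration method\n",
    "\n",
    "    registration_init_time: registration date, format %Y%m%d\n",
    "\n",
    "    expiration_date: expiration date, format %Y%m%d\n",
    "\n",
    "6. song_extra_infos.csv: additional song information\n",
    "\n",
    "    song_id: song id\n",
    "\n",
    "    song name: song title\n",
    "\n",
    "    isrc: International Standard Recording Code. In theory it can serve as a song identifier, but the ISRCs here were generated without official verification, so the information encoded in them, such as the country code and reference year, may be incorrect. Several songs may also share one ISRC, because a single recording can be released more than once."
   ]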
  },
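  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The field notes above mention two encodings that need parsing before feature engineering: `genre_ids` values joined by `|`, and dates stored in `%Y%m%d` form. The cell below is a minimal sketch on made-up rows, showing one possible way to handle both with pandas. The values and the names `songs_demo`, `members_demo`, `genre_list` and `date_diff` are assumptions for illustration and are not taken from the real files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Toy rows; all values are invented for illustration only.\n",
    "songs_demo = pd.DataFrame({\n",
    "    'song_id': ['s1', 's2'],\n",
    "    'genre_ids': ['465|958', '2022'],\n",
    "})\n",
    "# Split the '|'-separated genre string into a list of genre ids.\n",
    "songs_demo['genre_list'] = songs_demo['genre_ids'].str.split('|')\n",
    "\n",
    "members_demo = pd.DataFrame({\n",
    "    'msno': ['u1'],\n",
    "    'registration_init_time': [20110820],\n",
    "    'expiration_date': [20170920],\n",
    "})\n",
    "# Dates are stored as integers in %Y%m%d form; parse them explicitly.\n",
    "for col in ['registration_init_time', 'expiration_date']:\n",
    "    members_demo[col] = pd.to_datetime(members_demo[col].astype(str), format='%Y%m%d')\n",
    "\n",
    "# Membership span in days, one possible derived feature.\n",
    "members_demo['date_diff'] = (\n",
    "    members_demo['expiration_date'] - members_demo['registration_init_time']\n",
    ").dt.days\n",
    "\n",
    "print(songs_demo)\n",
    "print(members_demo)"
   ]
  },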
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Serialization of Python objects to files\n",
    "import pickle as cPickle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the index-generation function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_index(unique_users, unique_items, user_item_output_filename, item_user_output_filename):\n",
    "    # Build the index tables for users and items.\n",
    "    # In this dataset msno and song_id are encrypted strings, so each id is mapped\n",
    "    # to a 0-based integer. The code is generic and re-indexes ids in any encoding.\n",
    "    # Relies on the global `dpath` being defined before the function is called.\n",
    "    users_index = dict()\n",
    "    items_index = dict()\n",
    "\n",
    "    # Index users in order of appearance\n",
    "    for j, u in enumerate(unique_users):\n",
    "        users_index[u] = j\n",
    "\n",
    "    # Likewise, build the item index dictionary\n",
    "    for j, i in enumerate(unique_items):\n",
    "        items_index[i] = j\n",
    "\n",
    "    # Save the user index table (the with-block closes the file handle)\n",
    "    with open(dpath + user_item_output_filename, 'wb') as f:\n",
    "        cPickle.dump(users_index, f)\n",
    "    # Save the song index table\n",
    "    with open(dpath + item_user_output_filename, 'wb') as f:\n",
    "        cPickle.dump(items_index, f)\n",
    "    return users_index, items_index"
   ]
  },
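  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the behaviour of the index function concrete, the cell below is a minimal sketch on made-up ids. It uses a hypothetical helper `build_index` and a temporary directory instead of `dpath`; both are assumptions for illustration and not part of the original pipeline. It shows that `enumerate` assigns 0-based integers in order of appearance and that the pickled dictionary loads back unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pickle\n",
    "import tempfile\n",
    "\n",
    "def build_index(unique_ids):\n",
    "    # Hypothetical helper: map each id to its 0-based position.\n",
    "    return {key: pos for pos, key in enumerate(unique_ids)}\n",
    "\n",
    "demo_users = ['uA=', 'uB=', 'uC=']  # invented ids\n",
    "demo_index = build_index(demo_users)\n",
    "print(demo_index)\n",
    "\n",
    "# Round-trip the dictionary through pickle in a temporary directory.\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    path = os.path.join(tmp_dir, 'Demo_Index.pkl')\n",
    "    with open(path, 'wb') as f:\n",
    "        pickle.dump(demo_index, f)\n",
    "    with open(path, 'rb') as f:\n",
    "        restored = pickle.load(f)\n",
    "\n",
    "# The restored mapping should equal the one that was saved.\n",
    "assert restored == demo_index"
   ]
  },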
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the feature-engineered data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Load FE_Songs.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>song_id</th>\n",
       "      <th>genre_ids_matrix_0</th>\n",
       "      <th>genre_ids_matrix_1</th>\n",
       "      <th>genre_ids_matrix_2</th>\n",
       "      <th>genre_ids_matrix_3</th>\n",
       "      <th>genre_ids_matrix_4</th>\n",
       "      <th>genre_ids_matrix_5</th>\n",
       "      <th>genre_ids_matrix_6</th>\n",
       "      <th>genre_ids_matrix_7</th>\n",
       "      <th>genre_ids_matrix_8</th>\n",
       "      <th>...</th>\n",
       "      <th>language_-1.0</th>\n",
       "      <th>language_3.0</th>\n",
       "      <th>language_10.0</th>\n",
       "      <th>language_17.0</th>\n",
       "      <th>language_24.0</th>\n",
       "      <th>language_31.0</th>\n",
       "      <th>language_38.0</th>\n",
       "      <th>language_45.0</th>\n",
       "      <th>language_52.0</th>\n",
       "      <th>language_59.0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CXoTN1eb7AI+DntdU1vbcwGRV4SCIDxZu+YD8JP8r4E=</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>o0kFgae9QtnYgRkVPqLJwa05zIhRlUjfF7O1tDw0ZDU=</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>DwVvVurfpuz+XPuFvucclVQEyPqcpUkHR0ne1RQzPs0=</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>dKMBWoZyScdxSkihKG+Vf47nc18N9q4m58+b4e7dSSE=</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>W3bqWd3T+VeHFzHAUfARgW9AvVRaF4N5Yzm4Mr6Eo/o=</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 204 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        song_id  genre_ids_matrix_0  \\\n",
       "0  CXoTN1eb7AI+DntdU1vbcwGRV4SCIDxZu+YD8JP8r4E=                   0   \n",
       "1  o0kFgae9QtnYgRkVPqLJwa05zIhRlUjfF7O1tDw0ZDU=                   0   \n",
       "2  DwVvVurfpuz+XPuFvucclVQEyPqcpUkHR0ne1RQzPs0=                   0   \n",
       "3  dKMBWoZyScdxSkihKG+Vf47nc18N9q4m58+b4e7dSSE=                   0   \n",
       "4  W3bqWd3T+VeHFzHAUfARgW9AvVRaF4N5Yzm4Mr6Eo/o=                   0   \n",
       "\n",
       "   genre_ids_matrix_1  genre_ids_matrix_2  genre_ids_matrix_3  \\\n",
       "0                   0                   0                   0   \n",
       "1                   0                   0                   0   \n",
       "2                   0                   0                   0   \n",
       "3                   0                   0                   0   \n",
       "4                   0                   0                   0   \n",
       "\n",
       "   genre_ids_matrix_4  genre_ids_matrix_5  genre_ids_matrix_6  \\\n",
       "0                   0                   0                   0   \n",
       "1                   0                   0                   0   \n",
       "2                   0                   0                   0   \n",
       "3                   0                   0                   0   \n",
       "4                   0                   0                   0   \n",
       "\n",
       "   genre_ids_matrix_7  genre_ids_matrix_8  ...  language_-1.0  language_3.0  \\\n",
       "0                   0                   0  ...              0             1   \n",
       "1                   0                   0  ...              0             0   \n",
       "2                   0                   0  ...              0             0   \n",
       "3                   0                   0  ...              0             1   \n",
       "4                   0                   0  ...              0             0   \n",
       "\n",
       "   language_10.0  language_17.0  language_24.0  language_31.0  language_38.0  \\\n",
       "0              0              0              0              0              0   \n",
       "1              0              0              0              1              0   \n",
       "2              0              0              0              1              0   \n",
       "3              0              0              0              0              0   \n",
       "4              0              0              0              0              0   \n",
       "\n",
       "   language_45.0  language_52.0  language_59.0  \n",
       "0              0              0              0  \n",
       "1              0              0              0  \n",
       "2              0              0              0  \n",
       "3              0              0              0  \n",
       "4              0              1              0  \n",
       "\n",
       "[5 rows x 204 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "(2296833, 204)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dpath = '../data/'\n",
    "\n",
    "fe_songs = pd.read_csv(dpath + 'LR_data/FE_Songs.csv')\n",
    "fe_songs.head()\n",
    "fe_songs.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Load FE_Members.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>msno</th>\n",
       "      <th>city_1</th>\n",
       "      <th>city_3</th>\n",
       "      <th>city_4</th>\n",
       "      <th>city_5</th>\n",
       "      <th>city_6</th>\n",
       "      <th>city_7</th>\n",
       "      <th>city_8</th>\n",
       "      <th>city_9</th>\n",
       "      <th>city_10</th>\n",
       "      <th>...</th>\n",
       "      <th>date_diff_bin_10</th>\n",
       "      <th>date_diff_bin_30</th>\n",
       "      <th>date_diff_bin_183</th>\n",
       "      <th>date_diff_bin_365</th>\n",
       "      <th>date_diff_bin_730</th>\n",
       "      <th>date_diff_bin_1095</th>\n",
       "      <th>date_diff_bin_1825</th>\n",
       "      <th>date_diff_bin_2555</th>\n",
       "      <th>date_diff_bin_3650</th>\n",
       "      <th>date_diff_bin_99999</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>XQxgAYj3klVKjR3oxPPXYYFp4soD4TuBghkhMTD4oTw=</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>UizsfmJb9mV54qE9hCYyU07Va97c0lCRLEQX3ae+ztM=</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>D8nEhsIOBSoE6VthTaqDX8U6lqjJ7dLdr72mOyLya2A=</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mCuD+tZ1hERA/o5GPqk38e041J8ZsBaLcu7nGoIIvhI=</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>q4HRBfVSssAFS9iRfxWrohxuk9kCYMKjHOEagUMV6rQ=</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 42 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                           msno  city_1  city_3  city_4  \\\n",
       "0  XQxgAYj3klVKjR3oxPPXYYFp4soD4TuBghkhMTD4oTw=       1       0       0   \n",
       "1  UizsfmJb9mV54qE9hCYyU07Va97c0lCRLEQX3ae+ztM=       1       0       0   \n",
       "2  D8nEhsIOBSoE6VthTaqDX8U6lqjJ7dLdr72mOyLya2A=       1       0       0   \n",
       "3  mCuD+tZ1hERA/o5GPqk38e041J8ZsBaLcu7nGoIIvhI=       1       0       0   \n",
       "4  q4HRBfVSssAFS9iRfxWrohxuk9kCYMKjHOEagUMV6rQ=       1       0       0   \n",
       "\n",
       "   city_5  city_6  city_7  city_8  city_9  city_10  ...  date_diff_bin_10  \\\n",
       "0       0       0       0       0       0        0  ...                 0   \n",
       "1       0       0       0       0       0        0  ...                 0   \n",
       "2       0       0       0       0       0        0  ...                 0   \n",
       "3       0       0       0       0       0        0  ...                 0   \n",
       "4       0       0       0       0       0        0  ...                 0   \n",
       "\n",
       "   date_diff_bin_30  date_diff_bin_183  date_diff_bin_365  date_diff_bin_730  \\\n",
       "0                 0                  0                  0                  0   \n",
       "1                 0                  0                  0                  1   \n",
       "2                 0                  0                  0                  1   \n",
       "3                 0                  0                  0                  0   \n",
       "4                 0                  1                  0                  0   \n",
       "\n",
       "   date_diff_bin_1095  date_diff_bin_1825  date_diff_bin_2555  \\\n",
       "0                   0                   0                   1   \n",
       "1                   0                   0                   0   \n",
       "2                   0                   0                   0   \n",
       "3                   0                   0                   0   \n",
       "4                   0                   0                   0   \n",
       "\n",
       "   date_diff_bin_3650  date_diff_bin_99999  \n",
       "0                   0                    0  \n",
       "1                   0                    0  \n",
       "2                   0                    0  \n",
       "3                   0                    0  \n",
       "4                   0                    0  \n",
       "\n",
       "[5 rows x 42 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "(34403, 42)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fe_members = pd.read_csv(dpath + 'LR_data/FE_Members.csv')\n",
    "fe_members.head()\n",
    "fe_members.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Count users and songs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Users:  34403\n",
      "Songs:  2296833\n"
     ]
    }
   ],
   "source": [
    "# Count the total number of users and songs\n",
    "unique_set_users = fe_members['msno'].unique() # unique() removes duplicates\n",
    "unique_set_items = fe_songs['song_id'].unique()\n",
    "\n",
    "n_set_users = unique_set_users.shape[0]\n",
    "n_set_items = unique_set_items.shape[0]\n",
    "\n",
    "print(\"Users: \", n_set_users)\n",
    "print(\"Songs: \", n_set_items)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Build the user and song index tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_index, item_index = generate_index(unique_set_users, unique_set_items, 'LR_data/Members_Index.pkl', 'LR_data/Songs_Index.pkl')\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
